Search | WHO COVID-19 Research Database

Language models for the prediction of SARS-CoV-2 inhibitors

Blanchard, A. E.; Gounley, J.; Bhowmik, D.; Chandra Shekar, M.; Lyngaas, I.; Gao, S.; Yin, J.; Tsaris, A.; Wang, F.; Glaser, J.

Int J High Perform Comput Appl ; 2022.

Article in English | PubMed Central | ID: covidwho-2064608

ABSTRACT

The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
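The genetic-algorithm search described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token alphabet, the `score` function (standing in for the fine-tuned model's predicted binding affinity), and the `mutate` function (standing in for language-model generation) are all hypothetical placeholders chosen so the example runs on its own.

```python
import random

# Toy token set for illustration only; not a real SMILES vocabulary.
ALPHABET = "CNOFS"


def score(candidate: str) -> float:
    """Placeholder fitness: fraction of 'N' tokens.

    Assumption: stands in for a model-predicted binding affinity.
    """
    return candidate.count("N") / len(candidate)


def mutate(candidate: str, rng: random.Random) -> str:
    """Replace one randomly chosen token.

    Assumption: stands in for candidate generation by a language model.
    """
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]


def evolve(population, generations, rng):
    """Run an elitist loop: generate children, score, keep the best."""
    size = len(population)
    for _ in range(generations):
        children = [mutate(parent, rng) for parent in population]
        # Parents compete with children, so the best score never decreases.
        population = sorted(population + children, key=score, reverse=True)[:size]
    return population


rng = random.Random(0)
start = ["".join(rng.choice(ALPHABET) for _ in range(8)) for _ in range(10)]
final = evolve(start, generations=20, rng=rng)
print(max(score(c) for c in start), max(score(c) for c in final))
```

Because selection keeps parents alongside children, the best score in the population cannot get worse from one generation to the next, which is the property that makes this kind of loop usable as an optimizer.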

IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads

Saadi, A. A.; Alfe, D.; Babuji, Y.; Bhati, A.; Blaiszik, B.; Brace, A.; Brettin, T.; Chard, K.; Chard, R.; Clyde, A.; Coveney, P.; Foster, I.; Gibbs, T.; Jha, S.; Keipert, K.; Kranzlmüller, D.; Kurth, T.; Lee, H.; Li, Z.; Ma, H.; Mathias, G.; Merzky, A.; Partin, A.; Ramanathan, A.; Shah, A.; Stern, A.; Stevens, R.; Tan, L.; Titov, M.; Trifan, A.; Tsaris, A.; Turilli, M.; Van Dam, H.; Wan, S.; Wifling, D.; Yin, J.

50th International Conference on Parallel Processing, ICPP 2021 ; 2021.

Article in English | Scopus | ID: covidwho-1480302

ABSTRACT

The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2-3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved both to select better lead compounds, so as to improve the efficiency of later stages in the drug discovery protocol, and to identify those lead compounds more quickly. No known methodological approach can deliver this combination of higher quality and speed. Here, we describe an Integrated Modeling PipEline for COVID Cure by Assessing Better LEads (IMPECCABLE) that employs multiple methodological innovations to overcome this fundamental limitation. We also describe the computational framework that we have developed to support these innovations at scale, and characterize the performance of this framework in terms of throughput, peak performance, and scientific results. We show that individual workflow components deliver 100× to 1000× improvement over traditional methods, and that the integration of methods, supported by scalable infrastructure, speeds up drug discovery by orders of magnitude. IMPECCABLE has screened ∼10¹¹ ligands and has been used to discover a promising drug candidate. These capabilities have been used by the US DOE National Virtual Biotechnology Laboratory and the EU Centre of Excellence in Computational Biomedicine. © 2021 ACM.

Supercomputer-Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19.

Acharya, A; Agarwal, R; Baker, M B; Baudry, J; Bhowmik, D; Boehm, S; Byler, K G; Chen, S Y; Coates, L; Cooper, C J; Demerdash, O; Daidone, I; Eblen, J D; Ellingson, S; Forli, S; Glaser, J; Gumbart, J C; Gunnels, J; Hernandez, O; Irle, S; Kneller, D W; Kovalevsky, A; Larkin, J; Lawrence, T J; LeGrand, S; Liu, S-H; Mitchell, J C; Park, G; Parks, J M; Pavlova, A; Petridis, L; Poole, D; Pouchard, L; Ramanathan, A; Rogers, D M; Santos-Martins, D; Scheinberg, A; Sedova, A; Shen, Y; Smith, J C; Smith, M D; Soto, C; Tsaris, A; Thavappiragasam, M; Tillack, A F; Vermaas, J V; Vuong, V Q; Yin, J; Yoo, S; Zahran, M.

J Chem Inf Model ; 60(12): 5832-5852, 2020 12 28.

Article in English | MEDLINE | ID: covidwho-1065780

ABSTRACT

We present a supercomputer-driven pipeline for in silico drug discovery using enhanced sampling molecular dynamics (MD) and ensemble docking. Ensemble docking makes use of MD results by docking compound databases into representative protein binding-site conformations, thus taking into account the dynamic properties of the binding sites. We also describe preliminary results obtained for 24 systems involving eight proteins of the proteome of SARS-CoV-2. The MD involves temperature replica exchange enhanced sampling, making use of massively parallel supercomputing to quickly sample the configurational space of protein drug targets. Using the Summit supercomputer at the Oak Ridge National Laboratory, more than 1 ms of enhanced sampling MD can be generated per day. We have ensemble-docked repurposing databases to 10 configurations of each of the 24 SARS-CoV-2 systems using AutoDock Vina. Comparison to experiment demonstrates remarkably high hit rates for the top-scoring tranches of compounds identified by our ensemble approach. We also demonstrate that, using AutoDock-GPU on Summit, it is possible to perform exhaustive docking of one billion compounds in under 24 h. Finally, we discuss preliminary results and planned improvements to the pipeline, including the use of quantum mechanical (QM), machine learning, and artificial intelligence (AI) methods to cluster MD trajectories and rescore docking poses.
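As a rough illustration of ensemble docking, the sketch below ranks compounds by their best score across several binding-site conformations and then works out the minimum throughput implied by docking one billion compounds in under 24 h. The compound names and score values are invented for the example, and taking the minimum over conformations is one common aggregation choice assumed here, not necessarily the rule used in the paper. Docking scores are treated as binding energies, so more negative is better.

```python
def ensemble_score(scores_per_conformation):
    """Return the best (most negative) score across conformations.

    Assumption: min-aggregation; the paper may aggregate differently.
    """
    return min(scores_per_conformation)


def rank_compounds(docking_results):
    """Sort compound names from best to worst ensemble score."""
    return sorted(docking_results, key=lambda name: ensemble_score(docking_results[name]))


# Hypothetical scores (kcal/mol) for three compounds docked to three conformations.
results = {
    "cmpd_A": [-6.1, -7.4, -5.9],
    "cmpd_B": [-8.2, -6.0, -6.5],
    "cmpd_C": [-5.0, -5.2, -4.8],
}
ranking = rank_compounds(results)
print(ranking)

# Minimum average rate needed to dock one billion compounds within 24 hours.
compounds = 1_000_000_000
seconds_per_day = 24 * 3600
min_rate = compounds / seconds_per_day
print(f"at least {min_rate:.0f} compounds per second")
```

The throughput figure is a lower bound on the average rate across the whole machine: finishing in under 24 h requires more than about 11,574 compounds per second.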

Subject(s)

Antiviral Agents/chemistry , COVID-19 Drug Treatment , SARS-CoV-2/drug effects , Viral Nonstructural Proteins/chemistry , Artificial Intelligence , Binding Sites , Computer Simulation , Databases, Chemical , Drug Design , Drug Evaluation, Preclinical , Humans , Molecular Docking Simulation , Protein Conformation , Spike Glycoprotein, Coronavirus/chemistry , Structure-Activity Relationship
